TITLE OF THE INVENTION
WORD STRING COLLATING APPARATUS, WORD STRING COLLATING 
METHOD AND ADDRESS RECOGNITION APPARATUS 

CROSS-REFERENCE TO RELATED APPLICATIONS 
This application is based upon and claims the 
benefit of priority from the prior Japanese Patent 
Application No. 2000-102370, filed April 4, 2000, 
the entire contents of which are incorporated herein 
by reference. 

BACKGROUND OF THE INVENTION 
This invention relates to a word string collating
apparatus and word string collating method for
collating a word string, such as an address, with
addresses in an address dictionary when the word string
is extracted from a character recognition result that
may contain errors, for example in the character
recognition field in which a document inputting
apparatus or an optical character reading apparatus for
reading address information is used, and to an address
recognition apparatus for recognizing the address.

For example, an apparatus has been proposed that
extracts only the word string of an address from a word
string containing a destination address name, honorific
title and the like written on an envelope, by collating
the word string with addresses in an address dictionary.

As this type of word string collating apparatus,
an address collating apparatus that collates words
based on a distance between words is known in the art,
for example, an apparatus that performs a word
collating process based on an edit distance
(Levenshtein distance, LD; V. Levenshtein, Sov. Phys.
Dokl. 10, 707) or the like. In outline, when a word
string is extracted from a set of input character
recognition results containing errors, distances
(similarities) between the input word string and words
in the address dictionary are derived, and the address
collating process is performed by using the derived
distances as the measure.

The prior art technique is explained in detail 
below. 

FIG. 1 shows the construction of an address
collating apparatus used as the conventional word
string collating apparatus. In FIG. 1, an input
section 1 (for example, a photoelectric conversion
device such as an image scanner) receives a document
image and converts it into a form which can be
processed by a computer.

A character recognition section 2 performs 
processes such as the binary coding process, 
segmentation process and individual character 
recognition process for understanding the contents of 
the document image based on the input document image. 
In this case, a recognition result obtained in the
character recognition section 2 is hereinafter referred
to as a character recognition result. Although
character recognition techniques have been studied for
a long time, a system that reliably attains a character
recognition rate of 100% cannot be realized except in
some restricted cases. Therefore, in practice, means
must be provided for correctly extracting a word string
even if the character recognition result contains an
error.

A word string forming section 3 forms a word
string A based on the character recognition result in
the character recognition section 2 and stores it in a
memory M1. The word string A is a set of character
strings segmented in units of words.

For example, the word string A consisting of the 15
words "JOHN", "WILLIAMS", "MULTIPLE", "DLSTRICT",
"C", "1278", "SHEIATON", "STREEI", "UNLT", "5",
"RICHRTIONDHILL", "ONTARLO", "L4B", "2N1" and "CANADA"
is formed as shown in FIGS. 3 and 4 based on the
address of FIG. 2.

In an address dictionary M2 used as the word
dictionary, a plurality of address data items (words)
B1, B2, ... are stored in advance, and desired data
items can be read out at any time.

For example, as shown in FIG. 3, the address data
B1 including six word items, namely the street name
"WILLIAMS", street suffix "STREET", city name
"RICHMONDHILL", state name "ONTARIO", zip code (upper
three digits) "L4B" and zip code (lower three digits)
"2N1", is read out.

Further, as shown in FIG. 4, the address data B2
including six word items, namely the street name
"SHERATON", street suffix "STREET", city name
"RICHMONDHILL", state name "ONTARIO", zip code (upper
three digits) "L4B" and zip code (lower three digits)
"2N1", is read out.

A distance calculating section 11 calculates a
distance C_LD between words by use of the word string A
and address data B1 and stores the distance in a memory
M4. The distance C_LD between the words can be defined
in various ways; the edit distance (also called the
Levenshtein distance and hereinafter simply referred to
as LD) is one example. LD is the minimum number of
character replacement, insertion and deletion
operations required to convert the word string A into
the other word string B1, and is expressed by the
following equation.

$$ LD(A, B1) = \min \{ p\,a(i) + q\,b(i) + r\,c(i) \} $$

where a(i) indicates a certain number of replacing
operations, b(i) indicates a certain number of
insertion operations, and c(i) indicates a certain
number of deletion operations. Further, p, q and r are
weighting factors for the edit operations of
replacement, insertion and deletion and depend on the
characters involved. Since the number of combinations
of a(i), b(i), c(i) is in general unlimited, the
minimum value LD(A, B1) is derived by use of the
dynamic programming (DP) method.
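
The dynamic programming derivation of LD mentioned above can be sketched as follows. This is a minimal illustration and not the implementation of the apparatus. The function name levenshtein and the constant weights p, q, r (all 1 by default) are assumptions made for the example; the text allows the weights to depend on the characters involved.

```python
def levenshtein(a: str, b: str, p: int = 1, q: int = 1, r: int = 1) -> int:
    """Minimum weighted cost of converting string a into string b.

    p, q, r weight replacement, insertion and deletion. Constant weights
    are an assumption of this sketch.
    """
    # prev[j] = cost of converting the empty prefix of a into b[:j]
    prev = [j * q for j in range(len(b) + 1)]
    for i, ca in enumerate(a, start=1):
        # curr[0] = cost of converting a[:i] into the empty string
        curr = [i * r]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j - 1] + (0 if ca == cb else p),  # keep or replace
                curr[j - 1] + q,                       # insert cb
                prev[j] + r,                           # delete ca
            ))
        prev = curr
    return prev[-1]


if __name__ == "__main__":
    for recognized, dictionary in [("STREEI", "STREET"),
                                   ("RICHRTIONDHILL", "RICHMONDHILL")]:
        print(recognized, dictionary, levenshtein(recognized, dictionary))
```

Only two rows of the table are kept at a time, so the memory used grows with the length of the second string rather than with the product of both lengths.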

An optimum solution deriving section 12 selects,
from the plurality of address data items B1, B2, ...,
the one which has the minimum distance C_LD with
respect to the word string A and provides the selected
address data as the optimum solution.

An output section 10, for example a display device,
converts the optimum solution thus acquired into a form
which the user can understand and outputs the converted
address data.

Conventionally, since only the distance
(similarity) C_LD between the words is used to perform
the address collating process, erroneous address data
may be selected as the optimum solution rather than the
correct address data. This problem is explained with
reference to FIGS. 2, 3 and 4.

FIGS. 3 and 4 show address collating methods based
on the conventional method, taking an (imaginary)
address in Canada as an example. An input document
image is shown in FIG. 2. In this example, "John
Williams/Multiple District C/1278 Sheraton Street Unit
5/Richmondhill ONTARIO L4B 2N1 CANADA" is written. The
result obtained by processing the document image by use
of the character recognition section 2 and word string
forming section 3 is, as described before, a word
string containing character errors.

In this case, the recognized characters are all
converted into capital letters (no distinction is made
between capital letters and small letters). As shown in
FIGS. 3 and 4, the word string containing the character
errors is "JOHN-WILLIAMS-MULTIPLE-DLSTRICT-C-1278-
SHEIATON-STREEI-UNLT-5-RICHRTIONDHILL-ONTARLO-L4B-2N1-
CANADA".

In the address dictionary M2, a plurality of
address data items B1, B2, ... are stored in advance.
To simplify the explanation, only two address data
items are provided: the first address data B1
"WILLIAMS-STREET-RICHMONDHILL-ONTARIO-L4B-2N1" shown in
FIG. 3 and the second address data B2
"SHERATON-STREET-RICHMONDHILL-ONTARIO-L4B-2N1" shown in
FIG. 4. The items in each of the address data items
indicate, in order from the head, the street name,
street suffix, city name, state name, postal code
(upper three digits) and postal code (lower three
digits).

The distance calculating section 11 compares the
word string A with the first address data B1 and the
second address data B2. The method is to derive, for
each item in the address data B1 (B2), the word having
the minimum distance (maximum similarity). In the case
shown in FIGS. 3 and 4, the distance between the words
is derived based on LD and the similarity is derived
according to the following equation (1).

$$ \mathrm{SIMILARITY} = \frac{1}{LD'(A, B) + \varepsilon}, \qquad LD'(A, B) = \frac{LD(A, B)}{\mathrm{len}(A) + \mathrm{len}(B)} \qquad (1) $$

where len(A) and len(B) are functions expressing
the lengths of the character strings, and LD'(A, B)
indicates a normalized LD. Further, ε may be any
small real number, but in this example ε is set at "1".
Then, when LD takes the minimum value (that is, the
word strings A and B are the same), the similarity
takes the maximum value "1".
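
Equation (1) can be evaluated for individual word pairs as in the following sketch. The function names, the unit edit costs and the handling of two empty strings are assumptions made for the example; only the formula itself and the setting of the small constant to 1 come from the text.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance with unit costs (an assumption of this sketch)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j - 1] + (ca != cb),  # keep or replace
                            curr[j - 1] + 1,           # insert
                            prev[j] + 1))              # delete
        prev = curr
    return prev[-1]


def similarity(a: str, b: str, eps: float = 1.0) -> float:
    """SIMILARITY = 1 / (LD'(A, B) + eps), LD' = LD / (len(A) + len(B))."""
    total_len = len(a) + len(b)
    # Two empty strings are treated as identical (assumption).
    ld_norm = levenshtein(a, b) / total_len if total_len else 0.0
    return 1.0 / (ld_norm + eps)


if __name__ == "__main__":
    for a, b in [("WILLIAMS", "WILLIAMS"), ("STREEI", "STREET"),
                 ("SHEIATON", "SHERATON"), ("JOHN", "WILLIAMS")]:
        print(f"{a:>10} vs {b:<10} similarity = {similarity(a, b):.3f}")
```

With the constant set to 1, identical words give exactly 1, and a single misread character in a six-letter word lowers the value only to 12/13.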

Various words which are not present in the address
data items B1, B2 may exist in the document image. For
example, "Multiple District C" is not part of the
formal address but indicates a block. Other such words
may indicate the address name, the room number, or the
name of a country. Among them, the name of the receiver
(address name), such as "Williams" in this example, may
coincide with a street name or city name in some cases.

In the conventional method, since only the
distance (similarity) is used, the street name of the
first address data B1 may be erroneously collated with
the name of the receiver (address name) when the word
string A and the first address data B1 are collated.
In addition, because the character recognition process
misreads "Sheraton" as "SHEIATON", the result of
comparing the word string A with the second address
data B2 becomes worse than the result of comparing the
word string A with the first address data B1 when only
the distance (similarity) is used, and as a result,
erroneous recognition may occur.
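
The failure described above can be reproduced with the example data. The sketch below takes, for each dictionary item, the word of the word string A with the maximum similarity and averages these values. Averaging as the way of combining the items, the unit edit costs and all function names are assumptions made for the illustration.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance with unit costs (an assumption of this sketch)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j - 1] + (ca != cb),
                            curr[j - 1] + 1,
                            prev[j] + 1))
        prev = curr
    return prev[-1]


def similarity(a: str, b: str, eps: float = 1.0) -> float:
    """Equation (1) with the small constant set to 1."""
    return 1.0 / (levenshtein(a, b) / (len(a) + len(b)) + eps)


WORDS_A = ["JOHN", "WILLIAMS", "MULTIPLE", "DLSTRICT", "C", "1278",
           "SHEIATON", "STREEI", "UNLT", "5", "RICHRTIONDHILL",
           "ONTARLO", "L4B", "2N1", "CANADA"]
B1 = ["WILLIAMS", "STREET", "RICHMONDHILL", "ONTARIO", "L4B", "2N1"]
B2 = ["SHERATON", "STREET", "RICHMONDHILL", "ONTARIO", "L4B", "2N1"]


def best_matches(words, entry):
    """For each dictionary item, the most similar input word, position ignored."""
    return [max(words, key=lambda w: similarity(w, item)) for item in entry]


def conventional_score(words, entry):
    """Average of the per-item maximum similarities (aggregation is assumed)."""
    return sum(max(similarity(w, item) for w in words) for item in entry) / len(entry)


if __name__ == "__main__":
    for name, entry in [("B1", B1), ("B2", B2)]:
        print(name, best_matches(WORDS_A, entry),
              round(conventional_score(WORDS_A, entry), 3))
```

Because the receiver's name "WILLIAMS" matches the street name of B1 exactly while "SHEIATON" matches "SHERATON" only approximately, B1 obtains the higher score although B2 is the address actually written.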

BRIEF SUMMARY OF THE INVENTION 

Accordingly, an object of this invention is to 
provide a word string collating apparatus and word 
string collating method capable of performing the 
highly precise word string collating process in 
comparison with the conventional case when an input 
word string and each word in the word dictionary are 
collated in the character recognition field. 

Another object of this invention is to provide an 
address recognition apparatus capable of recognizing a 
word string of an address at high precision based on a 
destination address constructed by a word string 
including words of an address name, receiver's name, 
honorific title (position title), zip code and the like. 

According to a first aspect of this invention, 
there is provided a word string collating apparatus for 
collating an input word string and words in a word 
dictionary when a partial word string is extracted from 
the result of character recognition for a word string
including a plurality of words, comprising 
correspondence setting means for variously setting 
correspondence relations between the words of the input 
word string and the words in the word dictionary 
according to the number of words of the extracted 
partial word string; deriving means for deriving each 
distance between the words which are set into the 
correspondence relation by the correspondence setting 
means based on each similarity between the words and 
deriving the positional relation of each word of the 
input word string which is set into the correspondence 
relation by the correspondence setting means; and 
determining means for deriving an evaluated value based 
on the positional relation derived by the deriving 
means and the distance between the words which are set 
into the correspondence relation by the correspondence 
setting means for each type of the correspondence 
relation set by the correspondence setting means and 
determining a partial word string extracted from the 
input word string based on the evaluated value. 

According to a second aspect of this invention, 
there is provided a word string collating apparatus for 
collating an input word string and words in a word 
dictionary when a partial word string is extracted from 
the result of character recognition for a word string 
including a plurality of words, comprising word string 
forming means for forming a word string based on the
result of character recognition for a word string 
including a plurality of words; correspondence setting 
means for variously setting a correspondence relation 
between each word of the word string formed by the word 
string forming means and each word in the word 
dictionary according to the number of words of the 
extracted partial word string; distance calculating 
means for deriving a distance between words based on a 
similarity between the words which are set into the 
correspondence relation by the correspondence setting 
means; positional relation deriving means for deriving 
a positional relation of each word of the formed word 
string which is set into the correspondence relation by 
the correspondence setting means; evaluated value 
deriving means for deriving an evaluated value based on 
the positional relation derived by the positional 
relation deriving means and the distance, derived by 
the distance calculating means, between the words which 
are set to correspond to each other by the 
correspondence setting means for each type of the 
correspondence relation set by the correspondence 
setting means; and determining means for determining a 
partial word string extracted from the formed word 
string based on the evaluated value derived by the 
evaluated value deriving means. 

According to a third aspect of this invention, 
there is provided a word string collating apparatus for
collating words of an input first word string including 
a plurality of words and words of each of third various 
word strings of a word dictionary when a second word 
string using part of the plurality of words of the 
first word string is extracted from the result of 
character recognition for the first word string, 
comprising character recognizing means for recognizing 
the first word string containing the second word string 
to be extracted in the unit of character; word 
extracting means for extracting characters recognized 
by the character recognizing means in the unit of word; 
and word string extracting means for collating the 
first word string including a plurality of words 
extracted by the word extracting means and the third 
various word strings of the word string dictionary, 
determining words of the second word string in the 
first word string respectively corresponding to the 
words of the third word string based on similarities 
between the words of the first word string and the 
words of the third word string, making evaluation for 
each of the third word strings based on the number of 
words between the words in the second word string thus 
determined and the similarities between the words of 
the third word string and the words of the second word 
string determined, and extracting one of the third word 
strings as the second word string.

According to a fourth aspect of this invention,
there is provided an address recognition apparatus for
recognizing an address written on a paper sheet,
comprising character recognizing means for recognizing
a word string containing an address word string written
on the paper sheet in the unit of character; word
extracting means for extracting characters recognized
by the character recognizing means in the unit of word;
an address word string dictionary for previously
storing a plurality of first word strings each
constructing an address in which a word arrangement
order is determined; and address word string
recognizing means for collating a second word string
including a plurality of words extracted by the word
extracting means and the first various word strings in
the address word string dictionary, determining words
of the second word string respectively corresponding to
the words of the first word string based on
similarities between the words of the first word string
and the words of the second word string, making
evaluation for each of the first word strings based on
the number of words between the words in the second
word string thus determined and the similarities
between the words of the first word string and the
words of the second word string determined, and
recognizing one of the first word strings as the
address word string.

According to a fifth aspect of this invention,
there is provided an address recognition apparatus for 
recognizing an address written on a paper sheet, 
comprising character recognizing means for recognizing 
a word string containing an address word string written 
on the paper sheet in the unit of character; word 
extracting means for extracting characters recognized 
by the character recognizing means in the unit of word; 
an address word string dictionary for previously 
storing a plurality of first word strings each 
constructing an address in which a word arrangement 
order is determined; and address word string 
recognizing means for collating a second word string 
including a plurality of words extracted by the word 
extracting means and the first various word strings in 
the address word string dictionary, determining words 
of the second word string respectively corresponding to 
the words of the first word string based on the word 
arrangement order and similarities between the words of 
the first word string and the words of the second word 
string, making evaluation for each of the first word 
strings based on the number of words between the 
respective words in the second word string thus 
determined and the similarities between the words of 
the first word string and the words of the second word 
string determined, and recognizing one of the first 
word strings as the address word string.

Additional objects and advantages of the invention
will be set forth in the description which follows, and
in part will be obvious from the description, or may be
learned by practice of the invention. The objects and
advantages of the invention may be realized and
obtained by means of the instrumentalities and
combinations particularly pointed out hereinafter.

BRIEF DESCRIPTION OF THE SEVERAL VIEWS OF THE DRAWING
The accompanying drawings, which are incorporated 
in and constitute a part of the specification, 
illustrate presently preferred embodiments of the 
invention, and together with the general description 
given above and the detailed description of the 
preferred embodiments given below, serve to explain the 
principles of the invention. 

FIG. 1 is a block diagram schematically showing 
the construction of the conventional word string 
collating apparatus; 

FIG. 2 is a view showing an example of a document
image to be read;

FIGS. 3 and 4 are schematic diagrams for 
illustrating a conventional address collating method; 

FIG. 5 is a block diagram schematically showing 
the construction of a word string collating apparatus 
according to a first embodiment of this invention; 

FIGS. 6 and 7 are diagrams for illustrating an 
example of links attained by use of a link initializing 
section in the first embodiment;

FIGS. 8 and 9 are diagrams for illustrating a 
process effected by the link initializing section in 
the first embodiment; 

FIG. 10 is a flowchart for illustrating the 
address collating process in the first embodiment; 

FIG. 11 is a diagram for illustrating a process 
effected by a link initializing section used in a 
second embodiment of this invention; and 

FIGS. 12 and 13 are diagrams for illustrating a 
process effected by a link initializing section used in 
a third embodiment of this invention. 

DETAILED DESCRIPTION OF THE INVENTION 

There will now be described embodiments of this 
invention with reference to the accompanying drawings. 

The first embodiment of this invention is 
explained. 

FIG. 5 schematically shows the construction of a 
word string collating apparatus according to the first 
embodiment of this invention. The word string 
collating apparatus includes an input section 1 used as 
input means, a character recognizing section 2 used as 
recognition means, a word string forming section 3 used 
as word string forming means, a link initializing 
section 4 used as correspondence setting means, a 
distance calculating section 5 used as distance 
calculating means, a phase deriving section 6 used as 
positional relation deriving means, an evaluated value
calculating section 7 used as evaluated value
calculating means, a link updating section 8 used as
updating means, an optimum solution deriving section 9
used as optimum solution deriving means, an output
section 10, a memory M1 for storing a word string A, an
address dictionary M2 used as a word dictionary for
storing a plurality of address data items (words) B, a
memory M3 for storing a link C, a memory M4 for storing
a distance C_LD, and a memory M5 for storing an
evaluated value C^ll. 

The above sections are explained in detail below. 

As shown in FIG. 2, when a document image of an
address written on an envelope or the like is input,
the input section 1 converts the document image into a
form which a computer can process. The input section 1
is constituted by a reading device, for example, a
photoelectric converting device such as an image
scanner.

The character recognizing section 2 performs 
processes such as a binary coding process, segmentation 
process, individual character recognition process and 
the like for understanding the contents of the document 
image based on the document image input by use of the 
input section 1. In this specification, the 
recognition result obtained in the character 
recognizing section 2 is referred to as a character 
recognition result.

The word string forming section 3 forms a word
string A based on the character recognition result of
the character recognizing section 2 and stores the thus
formed word string into the memory M1. The word string
A is a set of character strings segmented in units of
words. For example, the words are segmented based on
the length of a space between characters or the type of
the character.

For example, a word string A consisting of the 15
words "JOHN", "WILLIAMS", "MULTIPLE", "DLSTRICT",
"C", "1278", "SHEIATON", "STREEI", "UNLT", "5",
"RICHRTIONDHILL", "ONTARLO", "L4B", "2N1" and "CANADA"
is formed as shown in FIGS. 6 and 7 based on the
address of FIG. 2.

In the address dictionary M2 used as the word
dictionary, a plurality of address data items (words)
B1, B2, ... are stored in advance, and desired data
items can be read out at any time.

For example, as shown in FIG. 6, the address data
B1 including six word items, namely the street name
"WILLIAMS", street suffix "STREET", city name
"RICHMONDHILL", state name "ONTARIO", zip code (upper
three digits) "L4B" and zip code (lower three digits)
"2N1", is read out.

Further, as shown in FIG. 7, the address data B2
including six word items, namely the street name
"SHERATON", street suffix "STREET", city name
"RICHMONDHILL", state name "ONTARIO", zip code (upper
three digits) "L4B" and zip code (lower three digits)
"2N1", is read out.

The link initializing section 4 initializes a link
C according to one address data item B1 (B2, ...) read
out from the address dictionary M2. Each time an
address data item B1, B2, ... is read out, all of the
links are reset; the resulting state is shown in
FIGS. 6 and 7. The link C indicates the correspondence
relation between the words of the word string A and the
words of the address data item B1, B2, ..., and the
correspondence relation is changed (updated) at any
time by the link updating section 8 which will be
described later.

One example of the link initialization is
explained below. In this example, the number of links
is set equal to the number of items of the address data
items B1, .... That is, the address data items B1, ...
each include six items corresponding to the street name,
street suffix, city name, state name, zip code (upper
three digits) and zip code (lower three digits). The
link is expressed by two arrays, as given by the
following equation (2).

$$ x_M[i] = i, \qquad x_I[i] = \frac{i\,(n - 1)}{5} \qquad (2) $$

where i is an index number of the link which
satisfies the following condition.

$$ 0 \le i \le 5 \qquad (3) $$

Further, n indicates the number of words of the
word string A, x_M[i] indicates the number of the item
(which lies in the range of the street name (0) to the
postal code (5)) on the address data B1, ... side, and
x_I[i] indicates the number of the word (which lies in
the range of the head (0) to the tail (n-1)) on the
word string A side.

The link initialization results are shown in
FIG. 6 and FIG. 7. FIG. 6 shows the initialized links
between the word string A and the address data B1, and
FIG. 7 shows the initialized links between the word
string A and the address data B2. As shown, the links
are uniformly arranged.

That is, as shown in FIG. 6, "JOHN" of the word
string A and "WILLIAMS" of the address data B1 are
linked, "DLSTRICT" of the word string A and "STREET" of
the address data B1 are linked, "SHEIATON" of the word
string A and "RICHMONDHILL" of the address data B1 are
linked, "5" of the word string A and "ONTARIO" of the
address data B1 are linked, "ONTARLO" of the word
string A and "L4B" of the address data B1 are linked,
and "CANADA" of the word string A and "2N1" of the
address data B1 are linked.

Further, as shown in FIG. 7, "JOHN" of the word
string A and "SHERATON" of the address data B2 are
linked, "DLSTRICT" of the word string A and "STREET" of
the address data B2 are linked, "SHEIATON" of the word
string A and "RICHMONDHILL" of the address data B2 are
linked, "5" of the word string A and "ONTARIO" of the
address data B2 are linked, "ONTARLO" of the word
string A and "L4B" of the address data B2 are linked,
and "CANADA" of the word string A and "2N1" of the
address data B2 are linked.
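
The uniform arrangement of the initial links can be sketched as follows, reading equation (2) as spreading the six links evenly from the head word (0) to the tail word (n-1). The function name init_links and the nearest-integer rounding are assumptions; the text does not state how fractional positions are rounded. With this rounding the fourth link falls on word 8 ("UNLT"), whereas the description of FIG. 6 lists "5" (word 9), so the figure may use a different rounding rule.

```python
WORDS_A = ["JOHN", "WILLIAMS", "MULTIPLE", "DLSTRICT", "C", "1278",
           "SHEIATON", "STREEI", "UNLT", "5", "RICHRTIONDHILL",
           "ONTARLO", "L4B", "2N1", "CANADA"]
B1 = ["WILLIAMS", "STREET", "RICHMONDHILL", "ONTARIO", "L4B", "2N1"]


def init_links(n_words: int, n_items: int = 6):
    """Return the two link arrays (x_m, x_i) in their initial, evenly spread state.

    x_m[i] is the item number on the address data side and x_i[i] the word
    number on the word string A side. Rounding to the nearest integer is an
    assumption of this sketch.
    """
    x_m = list(range(n_items))
    x_i = [round(i * (n_words - 1) / (n_items - 1)) for i in range(n_items)]
    return x_m, x_i


if __name__ == "__main__":
    x_m, x_i = init_links(len(WORDS_A))
    for item_no, word_no in zip(x_m, x_i):
        print(f"link {item_no}: {WORDS_A[word_no]!r} <-> {B1[item_no]!r}")
```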

The distance calculating section 5 calculates the
distance C_LD between the linked words of the word
string A and address data B1 (B2, ...) and stores the
calculated distance in the memory M4. The distance
between the words can be defined in various ways; the
edit distance (also called the Levenshtein distance and
hereinafter simply referred to as LD) is one example.
In this embodiment, the distance between the words is
calculated based on LD and the similarity between the
word string A and the address data B1 (B2, ...) is
derived according to the following equation (4).

$$ \mathrm{SIMILARITY} = \frac{1}{LD'(A, B) + \varepsilon}, \qquad LD'(A, B) = \frac{LD(A, B)}{\mathrm{len}(A) + \mathrm{len}(B)} \qquad (4) $$

where len(·) is a function expressing the length
of the character string, and LD'(·, ·) indicates
normalized LD. Further, ε may be any small real
number, but in this example ε is set at "1". Then,
when LD takes the minimum value (that is, the word
string A and the address data B1 are the same), the
similarity takes the maximum value "1".

In the distance calculating section 5, the above
calculation is made for the words related by the links
and the values thus calculated are averaged. That
is, a value indicated by the following equation (5) is
derived and the thus derived value is stored in the
memory M4 as the distance C_LD.

$$ C_{LD} = \frac{1}{6} \sum_{i} \frac{1}{LD'(\mathrm{str}(x_M[i]), \mathrm{str}(x_I[i])) + \varepsilon} \qquad (5) $$

The phase deriving section 6 derives a phase
difference between each link and a neighboring link
according to the following equation (6). In this
embodiment, in order to simplify the explanation, the
links immediately before and after the link concerned
(at most two links) are used as the neighboring links.

$$ \mathrm{PHASE\ DIFFERENCE} = (D^M_{i,i-1} - D^I_{i,i-1})^2 + (D^M_{i,i+1} - D^I_{i,i+1})^2 $$

$$ D^M_{i,i-1} = \mathrm{norm}(x_M[i-1] - x_M[i]), \qquad D^I_{i,i-1} = \mathrm{norm}(x_I[i-1] - x_I[i]) $$

$$ D^M_{i,i+1} = \mathrm{norm}(x_M[i+1] - x_M[i]), \qquad D^I_{i,i+1} = \mathrm{norm}(x_I[i+1] - x_I[i]) \qquad (6) $$

where norm(·) is a normalization function
corresponding to the number of words of the address
data B1 (B2, ...) and word string A (in this case, the
number of words (the number of items) of the address
data B1 (B2, ...) is fixed at six).
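
One possible reading of equation (6) is sketched below: for each link, the normalized index step to each neighboring link on the address data side is compared with the normalized step on the word string A side, and the squared differences are summed and then averaged over the links. The form of norm() (division by the largest index on the respective side), the treatment of the first and last links (one neighbor only), the function names and the link positions used in the demonstration are all assumptions and are not taken from the text.

```python
def norm(delta: int, count: int) -> float:
    """Scale an index difference by the largest index on that side (assumption)."""
    return delta / (count - 1)


def link_phase_difference(i, x_m, x_i, n_items, n_words):
    """Squared mismatch between both sides' steps to the neighbors of link i."""
    total = 0.0
    for j in (i - 1, i + 1):          # links before and after link i
        if 0 <= j < len(x_m):         # end links have only one neighbor
            d_m = norm(x_m[j] - x_m[i], n_items)
            d_i = norm(x_i[j] - x_i[i], n_words)
            total += (d_m - d_i) ** 2
    return total


def mean_phase_difference(x_m, x_i, n_items, n_words):
    """Average of the per-link values over all links."""
    values = [link_phase_difference(i, x_m, x_i, n_items, n_words)
              for i in range(len(x_m))]
    return sum(values) / len(values)


if __name__ == "__main__":
    x_m = [0, 1, 2, 3, 4, 5]
    # Hypothetical link positions in a 15-word string: one set starts far
    # from the remaining links, the other is compact.
    spread_links = [1, 7, 10, 11, 12, 13]
    compact_links = [6, 7, 10, 11, 12, 13]
    print(mean_phase_difference(x_m, spread_links, 6, 15))
    print(mean_phase_difference(x_m, compact_links, 6, 15))
```

Under these assumptions a set of links whose spacing on the word string side mirrors the spacing of the dictionary items gives a small value, while a link that lies far from its neighbors raises it.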

An example of the calculation in the distance
calculating section 5 is explained for the case of an
n-th link between the word string A and the address
data B1 as shown in FIG. 8 and the case of an m-th link
between the word string A and the address data B2 as
shown in FIG. 9.

In this case, as shown in FIG. 8, "WILLIAMS" of
the word string A and "WILLIAMS" of the address data B1
are linked, "STREEI" of the word string A and "STREET"
of the address data B1 are linked, "RICHRTIONDHILL" of
the word string A and "RICHMONDHILL" of the address
data B1 are linked, "ONTARLO" of the word string A and
"ONTARIO" of the address data B1 are linked, "L4B" of
the word string A and "L4B" of the address data B1 are
linked, and "2N1" of the word string A and "2N1" of the
address data B1 are linked.

Therefore, the distance (similarity) between
"WILLIAMS" of the word string A and "WILLIAMS" of the
address data B1, which are set in the first linked
state, is "1.0". The distance between "STREEI" of the
word string A and "STREET" of the address data B1,
which are set in the second linked state, is "0.9". The
distance between "RICHRTIONDHILL" of the word string A
and "RICHMONDHILL" of the address data B1, which are
set in the third linked state, is "0.9". The distance
between "ONTARLO" of the word string A and "ONTARIO" of
the address data B1, which are set in the fourth linked
state, is "0.9". The distance between "L4B" of the word
string A and "L4B" of the address data B1, which are
set in the fifth linked state, is "1.0". The distance
between "2N1" of the word string A and "2N1" of the
address data B1, which are set in the sixth linked
state, is "1.0".

As a result, the similarity (C_LD) between the word
string A and the address data B1 in the n-th link is
expressed as follows.

$$ C_{LD} = \frac{1}{6}(1.0 + 0.9 + 0.9 + 0.9 + 1.0 + 1.0) = 0.95 $$

Further, as shown in FIG. 9, "SHEIATON" of the 
word string A and "SHERATON" of the address data B2 are 
linked, "STREEI" of the word string A and "STREET" of 
the address data B2 are linked, "RICHRTIONDHILL" of the 
word string A and "RICHMONDHILL" of the address data B2 
are linked, "ONTARLO" of the word string A and 
"ONTARIO" of the address data B2 are linked, "L4B" of 
the word string A and "L4B" of the address data B2 are 
linked, and "2N1" of the word string A and "2N1" of the 
address data B2 are linked. 

Therefore, the distance (similarity) between the words
"SHEIATON" of the word string A and "SHERATON" of the
address data B2, which are set in the first linked
state, is set at "0.9". The distance between the words
"STREEI" of the word string A and "STREET" of the
address data B2, which are set in the second linked
state, is set at "0.9". The distance between the words
"RICHRTIONDHILL" of the word string A and
"RICHMONDHILL" of the address data B2, which are set in
the third linked state, is set at "0.9". The distance
between the words "ONTARLO" of the word string A and
"ONTARIO" of the address data B2, which are set in the
fourth linked state, is set at "0.9". The distance
between the words "L4B" of the word string A and "L4B"
of the address data B2, which are set in the fifth
linked state, is set at "1.0". The distance between the
words "2N1" of the word string A and "2N1" of the
address data B2, which are set in the sixth linked
state, is set at "1.0".

As a result, the similarity (C_LD) between the word
string A and the address data B2 in the m-th link is
expressed as follows.

$$C_{LD} = \frac{1}{6}\,(0.9 + 0.9 + 0.9 + 0.9 + 1.0 + 1.0) = 0.93$$
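
The two averages above can be checked with a minimal
sketch. The per-link values are those quoted in the
text; the function name average_similarity and the
variable names are illustrative assumptions, not part
of the described apparatus.

```python
from statistics import mean

# Per-link similarities quoted in the text for the links of FIG. 8 (B1)
# and FIG. 9 (B2). The variable names are hypothetical.
links_b1 = [1.0, 0.9, 0.9, 0.9, 1.0, 1.0]
links_b2 = [0.9, 0.9, 0.9, 0.9, 1.0, 1.0]


def average_similarity(similarities):
    """Return C_LD as the mean of the per-link similarities."""
    return mean(similarities)


print(round(average_similarity(links_b1), 2))
print(round(average_similarity(links_b2), 2))
```

On similarity alone, the address data B1 scores
slightly higher than the address data B2, which is why
the phase difference is needed to separate them.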

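The phase deriving section 6 makes the above
calculation for each link and averages the thus derived
values. That is, a value expressed by the following
equation (7) is derived and the thus derived value is
stored into the memory M5 as a phase difference C_P.

$$C_P = \frac{1}{N}\sum_{i}\left[\left(D^{A}_{i,i-1} - D^{B}_{i,i-1}\right)^2 + \left(D^{A}_{i,i+1} - D^{B}_{i,i+1}\right)^2\right] \qquad (7)$$

Here, D^A and D^B denote the positional distance
between the i-th linked word and the adjacent linked
word in the word string A and in the address data,
respectively, and N is the number of links.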

That is, the number "5" of words lying between
"WILLIAMS" in the first linked position of the word
string A and "STREEI" in the second linked position is
used as the phase difference. The number "2" of words
lying between "STREEI" in the second linked position of
the word string A and "RICHRTIONDHILL" in the third
linked position is used as the phase difference. The
number "0" of words lying between "RICHRTIONDHILL" in
the third linked position of the word string A and
"ONTARLO" in the fourth linked position is used as the
phase difference. The number "0" of words lying
between "ONTARLO" in the fourth linked position of the
word string A and "L4B" in the fifth linked position is
used as the phase difference. The number "0" of words
lying between "L4B" in the fifth linked position of the
word string A and "2N1" in the sixth linked position is
used as the phase difference.

Thus, the phase difference C_P is expressed as
follows.

$$C_P = \frac{1}{6}\left[(6-1)^2 + (3-1)^2 + (3-1)^2 + (1-1)^2 + (1-1)^2 + (1-1)^2 + (1-1)^2 + (1-1)^2\right]$$

As a result, the phase difference (C_P) between the
word string A and the address data B1 in the n-th link
is set to "9.66".

Further, the number "0" of words lying between
"SHEIATON" in the first linked position of the word
string A and "STREEI" in the second linked position is
used as the phase difference. The number "2" of words
lying between "STREEI" in the second linked position of
the word string A and "RICHRTIONDHILL" in the third
linked position is used as the phase difference. The
number "0" of words lying between "RICHRTIONDHILL" in
the third linked position of the word string A and
"ONTARLO" in the fourth linked position is used as the
phase difference. The number "0" of words lying
between "ONTARLO" in the fourth linked position of the
word string A and "L4B" in the fifth linked position is
used as the phase difference. The number "0" of words
lying between "L4B" in the fifth linked position of the
word string A and "2N1" in the sixth linked position is
used as the phase difference.

Thus, the phase difference C_P is expressed as
follows.

$$C_P = \frac{1}{6}\left[(1-1)^2 + (3-1)^2 + (3-1)^2 + (1-1)^2 + (1-1)^2 + (1-1)^2 + (1-1)^2 + (1-1)^2 + (1-1)^2\right] = 1.26$$

As a result, the phase difference (Cp) between the 
word string A and the address data B2 in the m-th link 
is set to "1.26". 
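
The phase calculation can be sketched as follows, under
stated assumptions. The word positions are hypothetical
and chosen only so that the numbers of intervening
words match the text (5, 2, 0, 0, 0 for B1 and
0, 2, 0, 0, 0 for B2). The sketch also assumes that
each link contributes one squared term for each
adjacent link and that the total is divided by the
number of links. With these assumptions it yields about
9.67 for the address data B1, in line with the "9.66"
above, and about 1.33 for the address data B2, which is
close to but not equal to the "1.26" above, so the
exact convention of the original calculation may
differ.

```python
def phase_difference(pos_a, pos_b):
    """Average squared mismatch of neighbour spacing between two link sets.

    pos_a[i] is the position of the i-th linked word in the word string A,
    pos_b[i] the position of its partner in the address data (assumed layout).
    """
    n = len(pos_a)
    total = 0
    for i in range(n):
        for j in (i - 1, i + 1):      # previous and next link, if present
            if 0 <= j < n:
                d_a = abs(pos_a[j] - pos_a[i])
                d_b = abs(pos_b[j] - pos_b[i])
                total += (d_a - d_b) ** 2
    return total / n


# Hypothetical positions consistent with the word counts quoted in the text.
address_positions = [0, 1, 2, 3, 4, 5]
links_to_b1 = [0, 6, 9, 10, 11, 12]   # WILLIAMS ... STREEI ... RICHRTIONDHILL ...
links_to_b2 = [5, 6, 9, 10, 11, 12]   # SHEIATON STREEI ... RICHRTIONDHILL ...

print(round(phase_difference(links_to_b1, address_positions), 2))
print(round(phase_difference(links_to_b2, address_positions), 2))
```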

The evaluated value calculating section 7 derives
an evaluated value C_ALL according to the following
evaluation function based on the distance C_LD and the
phase difference C_P which are previously derived and
stores the thus derived evaluated value C_ALL into the
memory M6. The derived evaluated value C_ALL indicates
that a better collation result is obtained as the
value thereof is smaller.

$$C_{ALL} = C_P - C_{LD}$$

The link updating section 8 updates the link C
based on the derived evaluated value C_ALL. That is,
the evaluated value C_ALL' derived in the present cycle
is compared with the previous evaluated value C_ALL,
and if the evaluated value C_ALL' is smaller, it is
used instead of the previous evaluated value C_ALL
(hill-climbing method). Then, part of the links is
updated and the process starting from the process of
the distance calculating section 5 is performed again.
The process is repeated until it is stopped by the
optimum solution deriving section 9, which will be
described later.
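
The update rule described above can be sketched as a
simple hill-climbing loop. This is only an illustrative
sketch: the representation of the links as a list of
positions in the word string A, the function name
hill_climb, and the rule of moving one link at a time
are assumptions, and the cost function is supplied by
the caller in place of the evaluated value C_ALL.

```python
def hill_climb(links, n_positions, cost):
    """Move one link at a time and keep a move only if the cost decreases.

    links       -- list of positions in the word string A, one per address word
    n_positions -- number of words in the word string A
    cost        -- callable returning the evaluated value for a list of links
    """
    best = list(links)
    best_cost = cost(best)
    improved = True
    while improved:                       # stop once no single move helps
        improved = False
        for i in range(len(best)):
            for p in range(n_positions):
                if p == best[i]:
                    continue
                trial = best[:i] + [p] + best[i + 1:]
                trial_cost = cost(trial)
                if trial_cost < best_cost:    # smaller value is better
                    best, best_cost = trial, trial_cost
                    improved = True
    return best, best_cost


# Toy cost with a known minimum, used only to exercise the loop.
target = [5, 6, 9]
toy_cost = lambda ls: sum((l - t) ** 2 for l, t in zip(ls, target))
print(hill_climb([0, 0, 0], 13, toy_cost))
```

Because a move is accepted only when the cost strictly
decreases, the loop terminates, but like any
hill-climbing search it may stop at a local minimum.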

For example, in the case of the n-th link between
the word string A and the address data B1, the
similarity (C_LD) = 0.95 and the phase difference
(C_P) = 9.66, and the evaluated value (C_ALL) is
derived as follows.

$$C_{ALL} = 9.66 - 0.95 = 8.71$$

Further, in the case of the m-th link between the
word string A and the address data B2, the similarity
(C_LD) = 0.93 and the phase difference (C_P) = 1.26,
and the evaluated value (C_ALL) is derived as follows.

$$C_{ALL} = 1.26 - 0.93 = 0.34$$

In this case, the evaluated value (C_ALL) for the
address data B2 is the minimum and is used as the
optimum solution.
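
The comparison of the two candidates can be sketched
as follows, using the values of C_P and C_LD quoted in
the text. The dictionary layout and the function name
evaluated_value are illustrative assumptions.

```python
def evaluated_value(c_p, c_ld):
    """Evaluated value: phase difference minus similarity (smaller is better)."""
    return c_p - c_ld


# (C_P, C_LD) pairs quoted in the text for the two address data items.
candidates = {"B1": (9.66, 0.95), "B2": (1.26, 0.93)}

scores = {name: evaluated_value(c_p, c_ld)
          for name, (c_p, c_ld) in candidates.items()}
best = min(scores, key=scores.get)   # candidate with the smallest value
print(best)
```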

When the optimum solution deriving section 9
determines that the transition of the evaluated value
(C_ALL) has sufficiently converged, the process
starting from the process of the link initializing
section 4 is performed again for the next address data
B2. When no address data B1, B2, ... remains to be
compared, the repetition operation is terminated.
Then, the smallest one of the evaluated values C_ALL
derived by comparison with the plurality of address
data items B1, B2, ... is selected and used as the
optimum solution.

That is, the address collation process in the
first embodiment is schematically shown in FIGS. 8 and
9. When only the distance (similarity) is used, the
similarity for the first address data B1 is larger.
However, the phase relation (phase difference) between
"WILLIAMS" and "STREEI" in the word string A for the
first address data B1 is poor. In the evaluation
standard in this embodiment, even if the characters do
not completely coincide with each other at the stage of
the character recognition result, as in "SHERATON" and
"SHEIATON", the phase relation with the adjacent link
is taken into consideration and the second address data
(correct solution) B2 is selected as the optimum
solution.

The output section 10, which is a display device,
for example, converts the optimum solution derived in
the optimum solution deriving section 9 into a form
which the user can understand and outputs the thus
converted address data.

With the above construction, the operation is 
explained with reference to the flowchart shown in 
FIG. 10. 

For example, a document image of an address or the
like written on an envelope is input to the input
section 1 (ST1) and supplied to the character
recognizing section 2. Then, the character recognizing 
section 2 performs the character recognizing process 
based on the input document image (ST2) and supplies 
the character recognition result to the word string 
forming section 3. 

As a result, the word string forming section 3 
forms a word string A based on the character 
recognition result of the character recognizing section 
2 (ST3) and stores the word string into the memory M1.

The link initializing section 4 initializes a link
C according to the word string A stored in the memory
M1 and one address data item B1 (B2, ...) read out from
the address dictionary M2 (ST4) and stores the
initialized link into the memory M3.

The distance calculating section 5 calculates a
distance C_LD between the linked words stored in the
memory M3 based on the word string A stored in the
memory M1 and one address data item B1 (B2, ...) read
out from the address dictionary M2 (ST5) and stores the
calculated distance in the memory M4.

The phase deriving section 6 makes the above
calculation for each link and averages the thus
obtained results (ST6). That is, a value is derived
based on the following equation (7) and the thus
derived value is stored in the memory M5 as the phase
difference (C_P).

$$C_P = \frac{1}{N}\sum_{i}\left[\left(D^{A}_{i,i-1} - D^{B}_{i,i-1}\right)^2 + \left(D^{A}_{i,i+1} - D^{B}_{i,i+1}\right)^2\right] \qquad (7)$$

The evaluated value calculating section 7 derives
an evaluated value C_ALL according to the
above-described evaluation function based on the
distance C_LD previously derived and stored in the
memory M4 and the phase difference C_P stored in the
memory M5 (ST7) and stores the derived evaluated value
C_ALL into the memory M6.

$$C_{ALL} = C_P - C_{LD}$$

The link updating section 8 updates the link C
stored in the memory M3 based on the evaluated value
C_ALL thus derived and stored in the memory M6 (ST8).

Then, the link updating section 8 determines
whether or not the updating process of the evaluated
value C_ALL by the hill-climbing method has been
performed for all of the links (ST9), and if the
updating process is not terminated, the operation
returns to the calculation process for the distance
C_LD between the words in the step ST5.

If it is determined in the step ST9 that the
updating process is terminated, it is determined
whether or not the process has been terminated for all
of the to-be-processed address data items in the
address dictionary M2 (ST10). If it is determined in
the step ST10 that the process is not terminated, the
operation returns to the link initialization process of
the step ST4. If the process is terminated, the optimum
solution deriving section 9 selects, from the memory
M6, the smallest one of the evaluated values C_ALL
obtained as the result of comparison with the plurality
of address data items B1, B2, ... in the address
dictionary M2 and outputs the selected evaluated value
to the output section 10 as the optimum solution
(ST11).

As a result, the output section 10 converts the
optimum solution derived in the optimum solution
deriving section 9 into a form which the user can
understand and outputs the thus converted optimum
solution (ST12).

As described before, the address collation process
in the first embodiment is schematically shown in
FIGS. 8 and 9. When only the distance (similarity) is
used, the similarity for the first address data B1 is
larger. However, the phase relation (phase difference)
between "WILLIAMS" and "STREEI" in the word string A
for the first address data B1 is poor. In the
evaluation standard in this embodiment, even if the
characters do not completely coincide with each other
at the stage of the character recognition result, as in
"SHERATON" and "SHEIATON", the phase relation with the
adjacent link is taken into consideration, that is, a
word close to the street suffix such as "STREET" is
taken into consideration, and the second address data
(correct solution) B2 is selected as the optimum
solution.

The output section 10, which is a display device,
for example, converts the optimum solution derived in
the optimum solution deriving section 9 into a form
which the user can understand (from code data to
display data) and outputs the thus converted data.

Thus, according to the first embodiment, the
address collation process can be effected with
extremely high precision by using a "link" expressing
the connection (correspondence relation) between the
word string and the address data, and by using the
link at the address collation time to connect the
characteristic portions which are locally similar to
each other from the viewpoint of both the phase
(positional relation) and the distance (similarity)
between the words.

Next, a second embodiment of this invention is 
explained. 

In the first embodiment described above, the
initial value of the link is determined by uniform
distribution in the link initializing section 4. In
the second embodiment, a word having a minimum distance
(maximum similarity) is temporarily derived for each
address data item before initialization and the thus
derived word is used as an initial value. This
operation is shown in FIG. 11.

Thus, according to the second embodiment, at the 
link initialization stage, a quasi-optimum solution can 
be derived, and since the optimum solution deriving 
process is started based on the thus derived value, the 
optimum solution searching time can be shortened. 
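
The initialization of the second embodiment can be
sketched as follows. This is a hedged illustration
only: the similarity is assumed here to be one minus
the Levenshtein distance divided by the longer word
length, which is one common normalization and is not
confirmed by this description, and the function names
and the shortened word string are hypothetical.

```python
def levenshtein(a, b):
    """Classic dynamic-programming edit distance between two strings."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,                # deletion
                            curr[j - 1] + 1,            # insertion
                            prev[j - 1] + (ca != cb)))  # substitution
        prev = curr
    return prev[-1]


def similarity(a, b):
    """Assumed normalization: 1.0 for identical words, lower otherwise."""
    longest = max(len(a), len(b)) or 1
    return 1.0 - levenshtein(a, b) / longest


def initial_links(word_string, address):
    """For each address word, link to the most similar word in the word string."""
    return [max(range(len(word_string)),
                key=lambda k: similarity(word_string[k], word))
            for word in address]


# Hypothetical, shortened word string built from words quoted in the text.
word_string = ["WILLIAMS", "SHEIATON", "STREEI", "RICHRTIONDHILL",
               "ONTARLO", "L4B", "2N1"]
address_b2 = ["SHERATON", "STREET", "RICHMONDHILL", "ONTARIO", "L4B", "2N1"]
print(initial_links(word_string, address_b2))
```

Starting the link update from such links, instead of
from uniformly distributed links, gives the search a
starting point that is already close to a good
solution.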

Next, a third embodiment of this invention is 
explained. 

In the first embodiment, the links are all re-set
in the link initializing section 4 each time the
address data B1, ... is read out. In the third
embodiment, if an item (word) which is common with
address data previously collated is present, the result
is re-used as the initial value. This operation is
shown in FIGS. 12 and 13.

In this case, the link between "SHEIATON" of the
word string A and "RICHMONDHILL" of the address data B1,
the link between "5" of the word string A and "ONTARIO"
of the address data B1, the link between "ONTARLO" of
the word string A and "L4B" of the address data B1, and
the link between "CANADA" of the word string A and
"2N1" of the address data B1 shown in FIG. 12 are
re-used as initial values for the next address data B2
as shown in FIG. 13.

Thus, according to the third embodiment, 
calculations for commonly usable (re-usable) portions in
the optimum solution previously obtained can be omitted 
and the optimum solution searching time can be 
shortened. 
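
The re-use of links in the third embodiment can be
sketched as a simple lookup. The sketch assumes that a
link is re-usable whenever the same word appears in
both the previously collated address data and the next
address data; the actual criterion for re-use may be
narrower. The function name reuse_links and the link
positions are hypothetical.

```python
def reuse_links(prev_address, prev_links, next_address, default=None):
    """Carry over links for words shared with the previous address data.

    Words of next_address that did not occur in prev_address get `default`
    and still need to be initialized by some other rule.
    """
    known = dict(zip(prev_address, prev_links))
    return [known.get(word, default) for word in next_address]


# Hypothetical link positions for the four words named in the text.
prev_address = ["WILLIAMS", "RICHMONDHILL", "ONTARIO", "L4B", "2N1"]
prev_links = [0, 5, 10, 11, 13]
next_address = ["SHERATON", "RICHMONDHILL", "ONTARIO", "L4B", "2N1"]
print(reuse_links(prev_address, prev_links, next_address))
```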

This invention is not limited to the above 
embodiments and can be variously modified without 
departing from the technical scope thereof. For 
example, the method for deriving the evaluated value, 
phase and distance between the words is not limited to 
the equations described above. Further, as the link 
updating method, various optimization methods which are 
well known in the art can be used other than the hill- 
climbing method as described above. 

Additional advantages and modifications will 
readily occur to those skilled in the art. Therefore, 
the invention in its broader aspects is not limited to
the specific details and representative embodiments 
shown and described herein. Accordingly, various 
modifications may be made without departing from the 
spirit or scope of the general inventive concept as 
defined by the appended claims and their equivalents. 



